Dataset info
| Number of variables | 9 |
|---|---|
| Number of observations | 3589048 |
| Missing cells | 0 (0.0%) |
| Duplicate rows | 48 (< 0.1%) |
| Total size in memory | 246.4 MiB |
| Average record size in memory | 72.0 B |
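The memory figures above fit nine columns of 8 bytes per value. The sketch below redoes that arithmetic from the observation count. The 8 bytes per value is an assumption (typical of float64, int64, and object pointers in pandas); the report does not list dtypes.

```python
# Reproduce the report's memory figures from the observation count.
# Assumption: every column stores 8 bytes per row (float64, int64, or an
# object pointer); the report itself does not list dtypes.
N_ROWS = 3_589_048      # "Number of observations"
N_COLS = 9              # "Number of variables"
BYTES_PER_VALUE = 8     # assumed, see above
MIB = 1024 ** 2

per_column_mib = N_ROWS * BYTES_PER_VALUE / MIB
total_mib = per_column_mib * N_COLS
record_bytes = N_COLS * BYTES_PER_VALUE

print(f"per column: {per_column_mib:.1f} MiB")   # compare with "Memory size"
print(f"total:      {total_mib:.1f} MiB")        # compare with "Total size in memory"
print(f"per record: {record_bytes} B")           # compare with "Average record size"
```

If the datetime columns are stored as Python strings, their true footprint is probably larger than this shallow estimate, because the string payloads are not counted.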
Variable types
| Numeric | 7 |
|---|---|
| Categorical | 2 |
| Boolean | 0 |
| Date | 0 |
| URL | 0 |
| Text (Unique) | 0 |
| Rejected | 0 |
| Unsupported | 0 |
Warnings
| Warning | Category |
|---|---|
| Dataset has 48 (< 0.1%) duplicate rows | Warning |
| dropoff_datetime only contains datetime values, but is categorical. Consider applying pd.to_datetime() | Type |
| dropoff_datetime has a high cardinality: 3339482 distinct values | Warning |
| dropoff_latitude is highly skewed (γ1 = -26.15938649) | Skewed |
| dropoff_longitude is highly skewed (γ1 = 26.19390567) | Skewed |
| pickup_datetime only contains datetime values, but is categorical. Consider applying pd.to_datetime() | Type |
| pickup_datetime has a high cardinality: 3340587 distinct values | Warning |
| pickup_latitude is highly skewed (γ1 = -24.71643819) | Skewed |
| pickup_longitude is highly skewed (γ1 = 24.74925192) | Skewed |
| total_amount is highly skewed (γ1 = 1416.592082) | Skewed |
| trip_distance has 51452 (1.4%) zeros | Zeros |
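The two Type warnings recommend pd.to_datetime(). Here is a minimal sketch of that conversion, using two timestamp pairs copied from the report's First rows table. The explicit format string is an assumption based on every value being 19 characters long, and duration_s is a hypothetical column name used only for illustration.

```python
import pandas as pd

# Two rows copied from the report's "First rows" table; the timestamps
# arrive as strings, which is why the profiler labels them categorical.
df = pd.DataFrame({
    "pickup_datetime": ["2015-02-01 01:26:45", "2015-01-02 20:06:28"],
    "dropoff_datetime": ["2015-02-01 01:49:58", "2015-01-02 20:14:04"],
})

# Assumed layout: "YYYY-MM-DD HH:MM:SS" (all values have length 19).
fmt = "%Y-%m-%d %H:%M:%S"
for col in ("pickup_datetime", "dropoff_datetime"):
    df[col] = pd.to_datetime(df[col], format=fmt)

# With real datetimes, a derived feature such as trip duration is one line.
# "duration_s" is a hypothetical column, not part of the dataset.
df["duration_s"] = (df["dropoff_datetime"] - df["pickup_datetime"]).dt.total_seconds()

print(df.dtypes)
print(df["duration_s"].tolist())
```

After conversion the high-cardinality warnings on both columns should no longer matter, since near-unique values are expected for timestamps.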
dropoff_datetime
Categorical
| Distinct count | 3339482 |
|---|---|
| Unique (%) | 93.0% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2014-08-10 14:55:38 | 31 | < 0.1% | |
| 2015-04-20 00:00:00 | 12 | < 0.1% | |
| 2015-02-13 00:00:00 | 11 | < 0.1% | |
| 2015-03-02 00:00:00 | 9 | < 0.1% | |
| 2015-04-12 00:00:00 | 9 | < 0.1% | |
| 2015-02-08 00:00:00 | 9 | < 0.1% | |
| 2015-04-26 00:00:00 | 8 | < 0.1% | |
| 2015-03-29 00:00:00 | 8 | < 0.1% | |
| 2015-03-08 00:00:00 | 8 | < 0.1% | |
| 2015-06-01 00:00:00 | 8 | < 0.1% | |
| Other values (3339472) | 3588935 | > 99.9% |
| Max length | 19 |
|---|---|
| Mean length | 19 |
| Min length | 19 |
| Contains chars | False |
| Contains digits | True |
| Contains spaces | True |
| Contains non-words | True |
dropoff_latitude
Numeric
| Distinct count | 88454 |
|---|---|
| Unique (%) | 2.5% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 40.69178856 |
|---|---|
| Minimum | 0 |
| Maximum | 43.16053772 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 40.66179276 |
| Q1 | 40.70624542 |
| Median | 40.7504425 |
| Q3 | 40.79576874 |
| 95-th percentile | 40.84837341 |
| Maximum | 43.16053772 |
| Range | 43.16053772 |
| Interquartile range | 0.0895233154 |
Descriptive statistics
| Standard deviation | 1.550997685 |
|---|---|
| Coef of variation | 0.03811574127 |
| Kurtosis | 683.3172003 |
| Mean | 40.69178856 |
| MAD | 0.1265361092 |
| Skewness | -26.15938649 |
| Sum | 146044782.3 |
| Variance | 2.405593818 |
| Memory size | 27.4 MiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 5199 | 0.1% | |
| 40.77432632 | 327 | < 0.1% | |
| 40.77430344 | 324 | < 0.1% | |
| 40.80513382 | 321 | < 0.1% | |
| 40.80515671 | 305 | < 0.1% | |
| 40.80512619 | 302 | < 0.1% | |
| 40.80511856 | 302 | < 0.1% | |
| 40.77428818 | 299 | < 0.1% | |
| 40.7743187 | 297 | < 0.1% | |
| 40.80514145 | 297 | < 0.1% | |
| Other values (88444) | 3581075 | 99.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 5199 | 0.1% | |
| 29.57538986 | 1 | < 0.1% | |
| 29.59758186 | 1 | < 0.1% | |
| 36.09853363 | 1 | < 0.1% | |
| 37.74568176 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 43.16053772 | 1 | < 0.1% | |
| 42.89497757 | 1 | < 0.1% | |
| 42.76523972 | 1 | < 0.1% | |
| 42.7485733 | 1 | < 0.1% | |
| 42.67972565 | 1 | < 0.1% |
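The large negative skewness reported for dropoff_latitude can plausibly be traced to the 5199 zero values sitting far below the bulk of the data near 40.7. The sketch below illustrates the effect on synthetic latitudes (illustrative values, not taken from the dataset) and then evaluates a two-point approximation, -(1 - 2p) / sqrt(p(1 - p)), for the report's zero share p. It uses the population skewness, which may differ slightly from the estimator the profiler applies.

```python
import math

def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Synthetic latitudes spread evenly between 40.70 and 40.80 (illustrative only).
clean = [40.70 + 0.0001 * i for i in range(1000)]
# The same data with two zero placeholders mixed in.
with_zeros = clean + [0.0, 0.0]

print(f"clean:      {skewness(clean):+.3f}")
print(f"with zeros: {skewness(with_zeros):+.3f}")

# Two-point approximation: a share p of the rows at 0, the rest at one value.
p = 5199 / 3_589_048
approx = -(1 - 2 * p) / math.sqrt(p * (1 - p))
print(f"two-point estimate for dropoff_latitude: {approx:.2f}")
```

The estimate lands close to the reported -26.16, which suggests, without proving it, that the zeros rather than genuine geographic spread dominate the skewness.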
dropoff_longitude
Numeric
| Distinct count | 45682 |
|---|---|
| Unique (%) | 1.3% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | -73.82657599 |
|---|---|
| Minimum | -122.3996277 |
| Maximum | 0 |
| Zeros (%) | 0.1% |
Quantile statistics
| Minimum | -122.3996277 |
|---|---|
| 5-th percentile | -73.99716949 |
| Q1 | -73.96717834 |
| Median | -73.94400024 |
| Q3 | -73.90833092 |
| 95-th percentile | -73.83304596 |
| Maximum | 0 |
| Range | 122.3996277 |
| Interquartile range | 0.05884742735 |
Descriptive statistics
| Standard deviation | 2.812673762 |
|---|---|
| Coef of variation | -0.03809839106 |
| Kurtosis | 684.6239492 |
| Mean | -73.82657599 |
| MAD | 0.2166120365 |
| Skewness | 26.19390567 |
| Sum | -264967124.9 |
| Variance | 7.911133694 |
| Memory size | 27.4 MiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 5199 | 0.1% | |
| -73.95274353 | 629 | < 0.1% | |
| -73.93916321 | 571 | < 0.1% | |
| -73.95272827 | 567 | < 0.1% | |
| -73.95276642 | 565 | < 0.1% | |
| -73.9393158 | 544 | < 0.1% | |
| -73.95278931 | 534 | < 0.1% | |
| -73.93917847 | 531 | < 0.1% | |
| -73.93932343 | 528 | < 0.1% | |
| -73.93914032 | 527 | < 0.1% | |
| Other values (45672) | 3578853 | 99.7% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| -122.3996277 | 1 | < 0.1% | |
| -115.1458664 | 1 | < 0.1% | |
| -84.55179596 | 1 | < 0.1% | |
| -83.42977142 | 1 | < 0.1% | |
| -81.24718475 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 5199 | 0.1% | |
| -28.18333244 | 1 | < 0.1% | |
| -70.91316986 | 1 | < 0.1% | |
| -70.91697693 | 1 | < 0.1% | |
| -70.92022705 | 1 | < 0.1% |
passenger_count
Numeric
| Distinct count | 10 |
|---|---|
| Unique (%) | < 0.1% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 1.404418665 |
|---|---|
| Minimum | 0 |
| Maximum | 9 |
| Zeros (%) | < 0.1% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| Median | 1 |
| Q3 | 1 |
| 95-th percentile | 5 |
| Maximum | 9 |
| Range | 9 |
| Interquartile range | 0 |
Descriptive statistics
| Standard deviation | 1.094671723 |
|---|---|
| Coef of variation | 0.7794482874 |
| Kurtosis | 7.721451287 |
| Mean | 1.404418665 |
| MAD | 0.6746373892 |
| Skewness | 2.946986301 |
| Sum | 5040526 |
| Variance | 1.198306181 |
| Memory size | 27.4 MiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 2990459 | 83.3% | |
| 2 | 266574 | 7.4% | |
| 5 | 159314 | 4.4% | |
| 3 | 86323 | 2.4% | |
| 6 | 59417 | 1.7% | |
| 4 | 25901 | 0.7% | |
| 0 | 894 | < 0.1% | |
| 8 | 80 | < 0.1% | |
| 7 | 70 | < 0.1% | |
| 9 | 16 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 894 | < 0.1% | |
| 1 | 2990459 | 83.3% | |
| 2 | 266574 | 7.4% | |
| 3 | 86323 | 2.4% | |
| 4 | 25901 | 0.7% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 9 | 16 | < 0.1% | |
| 8 | 80 | < 0.1% | |
| 7 | 70 | < 0.1% | |
| 6 | 59417 | 1.7% | |
| 5 | 159314 | 4.4% |
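Because passenger_count has only ten distinct values, its summary statistics can be recomputed directly from the frequency table. The sketch below does so as a consistency check. The n - 1 denominator for the variance is an assumption about the estimator the profiler uses.

```python
# Frequency table for passenger_count, copied from the report.
counts = {1: 2_990_459, 2: 266_574, 5: 159_314, 3: 86_323, 6: 59_417,
          4: 25_901, 0: 894, 8: 80, 7: 70, 9: 16}

n = sum(counts.values())                                 # number of observations
total = sum(value * c for value, c in counts.items())    # the "Sum" row
mean = total / n

# Sample variance with an n - 1 denominator (assumed estimator).
variance = sum(c * (value - mean) ** 2 for value, c in counts.items()) / (n - 1)

print(n, total)
print(f"mean = {mean:.9f}, variance = {variance:.9f}, std = {variance ** 0.5:.9f}")
```

The observation count and sum reproduce the report's 3589048 and 5040526 exactly, so the frequency table and the descriptive statistics agree with each other.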
pickup_datetime
Categorical
| Distinct count | 3340587 |
|---|---|
| Unique (%) | 93.1% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2014-08-09 15:54:25 | 31 | < 0.1% | |
| 2014-06-07 00:00:00 | 17 | < 0.1% | |
| 2014-07-05 00:00:00 | 13 | < 0.1% | |
| 2014-07-12 00:00:00 | 13 | < 0.1% | |
| 2014-06-06 00:00:00 | 13 | < 0.1% | |
| 2014-06-22 00:00:00 | 12 | < 0.1% | |
| 2014-06-01 00:00:00 | 12 | < 0.1% | |
| 2014-04-30 00:00:00 | 11 | < 0.1% | |
| 2014-05-29 00:00:00 | 11 | < 0.1% | |
| 2014-05-30 00:00:00 | 11 | < 0.1% | |
| Other values (3340577) | 3588904 | > 99.9% |
| Max length | 19 |
|---|---|
| Mean length | 19 |
| Min length | 19 |
| Contains chars | False |
| Contains digits | True |
| Contains spaces | True |
| Contains non-words | True |
pickup_latitude
Numeric
| Distinct count | 77041 |
|---|---|
| Unique (%) | 2.1% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 40.68698907 |
|---|---|
| Minimum | 0 |
| Maximum | 42.78691864 |
| Zeros (%) | 0.2% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 40.67240906 |
| Q1 | 40.70282364 |
| Median | 40.74771118 |
| Q3 | 40.8049202 |
| 95-th percentile | 40.8453331 |
| Maximum | 42.78691864 |
| Range | 42.78691864 |
| Interquartile range | 0.1020965576 |
Descriptive statistics
| Standard deviation | 1.641373779 |
|---|---|
| Coef of variation | 0.04034149041 |
| Kurtosis | 609.6833054 |
| Mean | 40.68698907 |
| MAD | 0.1367990944 |
| Skewness | -24.71643819 |
| Sum | 146027556.7 |
| Variance | 2.694107883 |
| Memory size | 27.4 MiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 5824 | 0.2% | |
| 40.72135162 | 1504 | < 0.1% | |
| 40.72133636 | 1379 | < 0.1% | |
| 40.72136688 | 1339 | < 0.1% | |
| 40.72135544 | 1237 | < 0.1% | |
| 40.72137833 | 1213 | < 0.1% | |
| 40.72134018 | 1153 | < 0.1% | |
| 40.7213707 | 1082 | < 0.1% | |
| 40.72132874 | 1081 | < 0.1% | |
| 40.72134781 | 1032 | < 0.1% | |
| Other values (77031) | 3572204 | 99.5% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 5824 | 0.2% | |
| 29.58062363 | 1 | < 0.1% | |
| 29.60992432 | 1 | < 0.1% | |
| 36.08561325 | 1 | < 0.1% | |
| 37.74568176 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 42.78691864 | 1 | < 0.1% | |
| 42.74856567 | 1 | < 0.1% | |
| 42.67842865 | 1 | < 0.1% | |
| 42.66678238 | 1 | < 0.1% | |
| 42.64542389 | 1 | < 0.1% |
pickup_longitude
Numeric
| Distinct count | 36181 |
|---|---|
| Unique (%) | 1.0% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | -73.81414566 |
|---|---|
| Minimum | -122.3996201 |
| Maximum | 0 |
| Zeros (%) | 0.2% |
Quantile statistics
| Minimum | -122.3996201 |
|---|---|
| 5-th percentile | -73.9907074 |
| Q1 | -73.95853424 |
| Median | -73.94424438 |
| Q3 | -73.91512299 |
| 95-th percentile | -73.84429932 |
| Maximum | 0 |
| Range | 122.3996201 |
| Interquartile range | 0.0434112549 |
Descriptive statistics
| Standard deviation | 2.97637456 |
|---|---|
| Coef of variation | -0.04032254973 |
| Kurtosis | 610.8643406 |
| Mean | -73.81414566 |
| MAD | 0.239971874 |
| Skewness | 24.74925192 |
| Sum | -264922511.9 |
| Variance | 8.858805519 |
| Memory size | 27.4 MiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 5824 | 0.2% | |
| -73.84429932 | 2283 | 0.1% | |
| -73.84429169 | 2097 | 0.1% | |
| -73.84427643 | 2089 | 0.1% | |
| -73.8442688 | 1900 | 0.1% | |
| -73.84430695 | 1888 | 0.1% | |
| -73.84428406 | 1832 | 0.1% | |
| -73.84431458 | 1736 | < 0.1% | |
| -73.8443222 | 1625 | < 0.1% | |
| -73.84425354 | 1567 | < 0.1% | |
| Other values (36171) | 3566207 | 99.4% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| -122.3996201 | 1 | < 0.1% | |
| -115.1502762 | 1 | < 0.1% | |
| -84.55180359 | 1 | < 0.1% | |
| -83.42976379 | 1 | < 0.1% | |
| -81.2538681 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 5824 | 0.2% | |
| -70.91651154 | 1 | < 0.1% | |
| -70.91851044 | 1 | < 0.1% | |
| -70.95729065 | 1 | < 0.1% | |
| -71.07021332 | 1 | < 0.1% |
total_amount
Numeric
| Distinct count | 8196 |
|---|---|
| Unique (%) | 0.2% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 14.78220803 |
|---|---|
| Minimum | -350 |
| Maximum | 51192 |
| Zeros (%) | 0.3% |
Quantile statistics
| Minimum | -350 |
|---|---|
| 5-th percentile | 5.3 |
| Q1 | 7.8 |
| Median | 11.3 |
| Q3 | 18 |
| 95-th percentile | 35 |
| Maximum | 51192 |
| Range | 51542 |
| Interquartile range | 10.2 |
Descriptive statistics
| Standard deviation | 29.77613526 |
|---|---|
| Coef of variation | 2.01432257 |
| Kurtosis | 2431553.483 |
| Mean | 14.78220803 |
| MAD | 7.542221629 |
| Skewness | 1416.592082 |
| Sum | 53054054.16 |
| Variance | 886.6182311 |
| Memory size | 27.4 MiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 8.3 | 74120 | 2.1% | |
| 7.8 | 72106 | 2.0% | |
| 6.8 | 72098 | 2.0% | |
| 7.3 | 70554 | 2.0% | |
| 8 | 69240 | 1.9% | |
| 7 | 67851 | 1.9% | |
| 6.3 | 64537 | 1.8% | |
| 6.5 | 61922 | 1.7% | |
| 8.8 | 60677 | 1.7% | |
| 6 | 57694 | 1.6% | |
| Other values (8186) | 2918249 | 81.3% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| -350 | 1 | < 0.1% | |
| -300 | 1 | < 0.1% | |
| -259.33 | 1 | < 0.1% | |
| -250.8 | 1 | < 0.1% | |
| -250 | 2 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 51192 | 1 | < 0.1% | |
| 4035.46 | 1 | < 0.1% | |
| 3352.5 | 1 | < 0.1% | |
| 2665.5 | 1 | < 0.1% | |
| 1229.8 | 1 | < 0.1% |
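The negative minimum and the 51192 maximum suggest that total_amount needs outlier handling before modeling. One common option, shown here as a sketch, is Tukey's interquartile-range rule applied to the quartiles reported above. The rule and its multiplier of 1.5 are assumptions; the report does not prescribe any outlier treatment.

```python
# Quartiles for total_amount as reported in the quantile table.
Q1, Q3 = 7.8, 18.0
iqr = Q3 - Q1                      # matches the reported interquartile range

# Tukey's rule with the conventional multiplier k = 1.5 (an assumption;
# the report does not prescribe any outlier rule).
K = 1.5
lower, upper = Q1 - K * iqr, Q3 + K * iqr

# A few values taken from the report's tables for total_amount.
samples = [-350, 0, 8.3, 11.3, 35, 1229.8, 51192]
flagged = [x for x in samples if x < lower or x > upper]

print(f"fences: [{lower:.1f}, {upper:.1f}]")
print("flagged:", flagged)
```

Since the 95th percentile (35) already lies above the upper fence, this rule would flag at least 5% of trips, so a wider multiplier or a domain-specific fare cap may be more appropriate.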
trip_distance
Numeric
| Distinct count | 3614 |
|---|---|
| Unique (%) | 0.1% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 2.94962022 |
|---|---|
| Minimum | 0 |
| Maximum | 439.53 |
| Zeros (%) | 1.4% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.48 |
| Q1 | 1.1 |
| Median | 1.99 |
| Q3 | 3.78 |
| 95-th percentile | 8.57 |
| Maximum | 439.53 |
| Range | 439.53 |
| Interquartile range | 2.68 |
Descriptive statistics
| Standard deviation | 2.980152093 |
|---|---|
| Coef of variation | 1.01035112 |
| Kurtosis | 244.8738648 |
| Mean | 2.94962022 |
| MAD | 2.026236885 |
| Skewness | 4.863721805 |
| Sum | 10586328.55 |
| Variance | 8.881306498 |
| Memory size | 27.4 MiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 51452 | 1.4% | |
| 0.9 | 40130 | 1.1% | |
| 1 | 39985 | 1.1% | |
| 0.8 | 38819 | 1.1% | |
| 1.1 | 38440 | 1.1% | |
| 1.2 | 36676 | 1.0% | |
| 1.3 | 34837 | 1.0% | |
| 0.7 | 34739 | 1.0% | |
| 1.4 | 32783 | 0.9% | |
| 1.5 | 30834 | 0.9% | |
| Other values (3604) | 3210353 | 89.4% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 51452 | 1.4% | |
| 0.01 | 3219 | 0.1% | |
| 0.02 | 2419 | 0.1% | |
| 0.03 | 2126 | 0.1% | |
| 0.04 | 1816 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 439.53 | 1 | < 0.1% | |
| 375.64 | 1 | < 0.1% | |
| 250.24 | 1 | < 0.1% | |
| 176.53 | 1 | < 0.1% | |
| 146.27 | 1 | < 0.1% |
First rows
| | dropoff_datetime | dropoff_latitude | dropoff_longitude | passenger_count | pickup_datetime | pickup_latitude | pickup_longitude | total_amount | trip_distance |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-02-01 01:49:58 | 40.728386 | -73.984764 | 1 | 2015-02-01 01:26:45 | 40.811172 | -73.953545 | 27.80 | 8.11 |
| 1 | 2015-01-02 20:14:04 | 40.711475 | -73.961571 | 1 | 2015-01-02 20:06:28 | 40.714321 | -73.946709 | 9.80 | 1.29 |
| 2 | 2014-09-27 18:19:56 | 40.777813 | -73.947304 | 5 | 2014-09-27 17:55:38 | 40.718094 | -73.957626 | 26.30 | 6.12 |
| 3 | 2014-04-27 02:39:02 | 40.718582 | -73.987785 | 2 | 2014-04-27 02:27:04 | 40.713997 | -73.949501 | 17.30 | 3.68 |
| 4 | 2014-05-26 18:44:13 | 40.664013 | -73.977325 | 1 | 2014-05-26 18:32:19 | 40.672195 | -73.944092 | 11.50 | 2.40 |
| 5 | 2015-03-04 21:43:47 | 40.812088 | -73.944008 | 1 | 2015-03-04 21:36:48 | 40.804962 | -73.954826 | 9.36 | 1.16 |
| 6 | 2015-01-21 09:51:01 | 40.656422 | -73.865089 | 1 | 2015-01-21 09:27:41 | 40.730438 | -73.862000 | 25.80 | 7.50 |
| 7 | 2015-03-07 19:20:49 | 40.635708 | -74.009293 | 6 | 2015-03-07 18:51:58 | 40.675697 | -73.971947 | 21.30 | 4.58 |
| 8 | 2015-01-11 17:04:26 | 40.684875 | -73.923279 | 1 | 2015-01-11 16:55:04 | 40.681896 | -73.949883 | 8.80 | 1.51 |
| 9 | 2014-05-30 06:00:00 | 40.781448 | -73.949173 | 1 | 2014-05-30 05:53:15 | 40.789875 | -73.952370 | 7.50 | 1.20 |
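The coordinate and distance columns can be cross-checked against each other: a reported trip_distance should not be shorter than the straight-line distance between pickup and dropoff. The sketch below applies the haversine formula to row 0 of the table above. The helper name haversine_miles is hypothetical, and treating trip_distance as miles is an assumption, since the report does not state the unit.

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2, radius_miles=3958.8):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_miles * math.asin(math.sqrt(a))

# Row 0 of the "First rows" table.
pickup = (40.811172, -73.953545)
dropoff = (40.728386, -73.984764)
trip_distance = 8.11  # unit assumed to be miles; the report does not say

straight_line = haversine_miles(*pickup, *dropoff)
print(f"straight line: {straight_line:.2f}, reported: {trip_distance}")
# A reported distance shorter than the straight line would be suspicious.
print("plausible:", trip_distance >= straight_line)
```

The same check could be applied row by row to catch records with zero coordinates or zero distances, both of which the report flags.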
Last rows
| | dropoff_datetime | dropoff_latitude | dropoff_longitude | passenger_count | pickup_datetime | pickup_latitude | pickup_longitude | total_amount | trip_distance |
|---|---|---|---|---|---|---|---|---|---|
| 3589038 | 2014-05-05 16:56:48 | 40.708706 | -73.800034 | 3 | 2014-05-05 16:51:38 | 40.716911 | -73.803459 | 6.50 | 0.70 |
| 3589039 | 2015-01-12 21:22:43 | 40.583813 | -73.937790 | 1 | 2015-01-12 21:17:27 | 40.587360 | -73.953857 | 9.10 | 1.00 |
| 3589040 | 2014-04-23 09:58:11 | 40.789410 | -73.952705 | 1 | 2014-04-23 09:45:48 | 40.805065 | -73.939651 | 12.38 | 1.84 |
| 3589041 | 2015-04-18 21:09:33 | 40.843395 | -73.905083 | 1 | 2015-04-18 20:57:46 | 40.814789 | -73.914703 | 11.80 | 2.43 |
| 3589042 | 2014-06-10 21:56:17 | 40.674210 | -73.967087 | 2 | 2014-06-10 21:39:48 | 40.650593 | -74.004562 | 15.50 | 3.50 |
| 3589043 | 2015-05-30 01:15:53 | 40.680450 | -73.956551 | 1 | 2015-05-30 01:03:53 | 40.680923 | -73.977394 | 11.30 | 1.95 |
| 3589044 | 2015-06-28 13:07:06 | 40.688084 | -73.995346 | 1 | 2015-06-28 12:55:48 | 40.679119 | -73.999702 | 9.30 | 0.81 |
| 3589045 | 2015-05-02 09:51:01 | 40.793259 | -73.951996 | 1 | 2015-05-02 09:45:50 | 40.807251 | -73.945900 | 6.80 | 1.19 |
| 3589046 | 2014-08-11 14:23:52 | 40.823505 | -73.941353 | 1 | 2014-08-11 14:10:03 | 40.795506 | -73.941856 | 15.00 | 2.60 |
| 3589047 | 2015-06-10 18:27:00 | 40.671173 | -73.974541 | 2 | 2015-06-10 18:21:15 | 40.676311 | -73.962761 | 7.80 | 1.00 |